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ABSTRACT 

We present a new and simple technique for selecting extensive, complete and pure quasar samples, 
based on their intrinsic variability. We parametrize the single-band variability by a power-law model 
for the light-curve structure function, with amplitude A and power-law index 7. We show that 
quasars can be efficiently separated from other non-variable and variable sources by the location of 
the individual sources in the A-^ plane. We use ^60 epochs of imaging data, taken over r^b years, from 
the SDSS stripe 82 (S82) survey, where extensive spectroscopy provides a reference sample of quasars, 
to demonstrate the power of variability as a quasar classifier in multi-epoch surveys. For UV-excess 
selected objects, variability performs just as well as the standard SDSS color selection, identifying 
quasars with a completeness of 90% and a purity of 95%. In the redshift range 2.5 < z < 3, where 
color selection is known to be problematic, variability can select quasars with a completeness of 90% 
and a purity of 96%. This is a factor of 5-10 times more pure than existing color-selection of quasars 
in this redshift range. Selecting objects from a broad griz color box without ii-band information, 
variability selection in S82 can afford completeness and purity of 92%, despite a factor of 30 more 
contaminants than quasars in the color-selected feeder sample. This confirms that the fraction of 
quasars hidden in the "stellar locus" of color-space is small. To test variability selection in the context 
of Pan-STARRS 1 (PSl) we created mock PSl data by down-sampling the S82 data to just 6 epochs 
over 3 years. Even with this much sparser time sampling, variability is an encouragingly efficient 
classifier. For instance, a 92% pure and 44% complete quasar candidate sample is attainable from the 
above ^rz2:-selected catalog. Finally, we show that the presented A-^ technique, besides selecting clean 
and pure samples of quasars (which are stochastically varying objects), is also efficient at selecting 
(periodic) variable objects such as RR Lyrae. 

Subject headings: methods: observational - surveys - galaxies: quasars igeneral 



1. INTRODUCTION 

Large, complete and pure samples of quasars have 
proven crucial for observational cosmology. Quasars 
serve as probes of galaxy evolution, map black hole 
growth and probe (and affect) the intergalactic medium. 
Quasar clustering is a tracer of mass clustering on 
both large and small scales (Croom et al. 2005, 2009b; 
Shen et al. 2007, 2009; Ross et al. 2009), and the large 
samples provide precise measurements of the evolu- 
tion and spectral properties of the quasars them- 
selves (Vanden Berk et al. 2001; Richards et al. 2002b, 
2004, 2006, 2009; Boyle et al. 2000; Croom et al. 2009b). 
Furthermore, huge quasar samples are required to 
find a large number of gravitationally lensed quasars 
(Oguri et al. 2006; Inada et al. 2008). Through the grav- 
itationally magnified quasars the quasar samples indi- 
rectly contribute to the understanding of the molec- 
ular gas content in distant galaxies (Yun et al. 1997; 
Riechers 2007a,b), mapping of the intergalactic medium 
and structures (e.g. Metcalf 2005; Hennawi et al. 2006b; 
Hennawi & Prochaska 2007) and exploration of the dark 
matter (halo) content of galaxies (Dalai & Kochanek 
2002; Bradac et al. 2002; Dobler & Keeton 2006; Maccio 
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2008). In short, large well-defined quasar samples are a 
cornerstone of observational cosmology. 

Photometric quasar samples have recently grown 
to nearly a million objects (850,000 actual quasars; 
Richards et al. 2009). Despite these impressive catalog 
sizes the number statistics still limit the achievable sci- 
ence in various cases; especially those where particular 
and hence rare geometric constellations of quasars are 
needed. For instance a 3<j detection of a luminosity- 
dependent quasar bias above 2; >1.9 when analyzing the 
angular clustering of quasars, needs an estimated sam- 
ple size of at least 1,200,000 actual quasars (Myers et al. 
2007a,b; Myers, White & Ball 2009). Searches for bi- 
nary quasars (Hennawi et al. 2006b, 2009; Myers et al. 
2008) - which provide interesting knowledge about small 
scale clustering and hence shed light on quasar trigger- 
ing mechanisms and the nature of quasar progenitors 
- also should be based on samples with > 10^ actual 
quasars in order to obtain reasonably sized statistical 
samples of possible quasar pairs. Also quasar-galaxy 
clustering (e.g. Scranton et al. 2005; Lopez et al. 2008; 
Padmanabhan et al. 2008; Burbidge & Napier 2009), i.e. 
exploring the statistics of quasars behind the fore- 
ground galaxies, calls for larger (relatively low- 2:) quasar 
samples than exist to date. Furthermore, explor- 
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ing the "transverse proximity effect" in the Lya for- 
est of quasars, with foreground quasars near the sight 
Hue of background quasars (e.g. Hennawi et al. 2006a; 
Hennawi & Prochaska 2007) is presently limited by 
quasar sample sizes. Obtaining larger photometric 
quasar catalogs to boost possible candidates for spec- 
troscopic follow-up is needed. The ^3a detections of the 
integrated Sachs- Wolfe effect by cross correlating quasars 
with the CMB to estimate the size of cosmological pa- 
rameters and the dark energy equation of state (e.g. 
Giannantonio et al. 2008; Xia et al. 2009; Scranton et al. 
2005) will also be improved by larger photometric sam- 
ples of 1<2;<5 quasars. Last but not least, larger photo- 
metric quasar catalogs will enhance the number of known 
gravitationally lensed quasars (e.g. Oguri & Marshall 
2010). At present ~100 quasar lenses are known and an 
even larger sample of the relatively rare gravitationally 
lensed quasar systems will among other things improve 
our knowledge about cosmology, galaxy mass distribu- 
tions, quasar hosts and the growth of the host's central 
black holes (Schneider, Kochanek & Wambsganss 2006, 
and references therein). These few examples serve as a 
scientific justification for pursuing even larger photomet- 
ric samples of low as well as of high redshift quasars. 

Most existing large quasar samples haven been se- 
lected on the basis of their UV/optical colors or radio 
flux. However, quasars are known to vary intrinsically 
on timescales of months to years and can therefore be 
selected alternatively on the basis of their variability 
(alone). Several physical processes are discussed as 
important causes of this variability: foremost are 
accretion disc instabilities (Rees 1984; Kawaguchi et al. 
1998; Pereyra et al. 2006) but also large-scale changes 
in the amount of in-falling material may be important 
(e.g. Hopkins et al. 2006, and references therein). Also, 
starbursts in the host galaxy (Aretxaga et al. 1997; 
Cid Fernandes et al. 1997), micro lensing by the host 
galaxy and compact dark matter objects (Hawkins 1996; 
Zackrisson et al. 2003), and stochasticity of multiple 
supernovae (Terlevich et al. 1992) have all been pro- 
posed. Irrespective of the physics behind the variability, 
quasars are observed to exhibit brightness variations, of 
typically > 10% over several years (e.g. Giveon et al. 
1999; Vanden Berk et al. 2004; Rengstorf et al. 2004; 
Sesar et al. 2007; MacLeod et al 2008; Bramich et al. 
2008; Wilhite et al. 2008; Kozlowski et al. 2010; 
Bauer et al. 2009; Kelly et al. 2009). This variability 
has been exploited for several purposes, e.g. to estimate 
Eddington ratios and black hole masses (Bauer et al. 
2009; Wilhite et al. 2008), or simply to identify them 
(Geha et al. 2003). 

With SDSS, QUEST and OGLE (see e.g. 
Abazajian et al. 2009; Rengstorf et al. 2004; 
Udalski et al. 1997, respectively), large-scale, multi- 
epoch and multi-band surveys have emerged, and have 
been used to search for quasars. The Panoramic Survey 
Telescope & Rapid Response System 1 (Pan-STARRS 1, 
Kaiser et al. 2002) and 4 (Pan-STARRS 4, Morgan et al. 
2008), and the Large Synoptic Survey Telescope (LSST, 
Ivezic et al. 2008; Abell et al. 2009) wih take such sur- 
veys to the next level. In all these surveys, the largest 
quasar samples stem from color selection in imaging (e.g. 
Richards et al. 2002a, 2004, 2009; Atlee & Gould 2007; 
D'Abrusco et al. 2009). The characteristic so-called 



"UV excess" of quasars, their bright blue u — g color, 
is capable of separating the quasars from their stellar 
contaminants in color-color space, allowing for efficient 
selection of targets for spectroscopic follow-up (e.g. 
Strauss et al. 2002). Such UV excess color selection 
is, however, only efficient for low {z < 2.5) and high 
{z > 3) redshift quasars, since the quasar and stellar 
loci overlap in the u — g color for 2.5 < z < 3.0 objects, 
causing the selection efficiency (or purity) in that region 
to drop below 50%. For quasars with 2.6 < z < 2.8 the 
efficiency is close to 10% (Richards et al. 2006). This 
confusion reigns until the Ly-break of high-z quasars 
moves into the ^-band and again makes for unusual 
colors (e.g. Fan et al. 2001, 2006). 

Moreover, ix-band imaging is expensive: the area and 
depth of an optical imaging survey can be greatly in- 
creased by focusing on redder filters, where atmospheric 
attenuation is lower and detectors more efficient. For ex- 
ample, the Pan-STARRS 1 telescope offers the possibility 
of creating the largest sample of quasars to date with its 
multi-epoch 30,000 deg^, 3/4 sky, "Stt" grizY imaging 
survey. For the purposes of identifying quasars in this 
data set, the question remains, "can we compensate for 
the lack of ii-band data by exploiting the multi-epoch 
nature of the ^-band imaging instead?" With one eye 
on the potential of Pan-STARRS 1, we therefore explore 
here the possibilities of creating large, complete and pure 
samples of quasars based on limited color information, 
but with light curves spanning several years. We use 
Stripe 82 of SDSS (Abazajian et al. 2009) as a testbed, 
both for the method in general and for making mock PSl 
data sets. 

This paper is organized as follows. In Section 2 we 
briefly review previous attempts to characterize quasar 
variability in optical imaging surveys, and then introduce 
the SDSS Stripe 82 data sets in Section 3. We introduce 
our methodology for quantifying the variability of various 
objects via their individual power-law structure functions 
in Section 4, and show results of selection experiments 
in Stripe 82 in Section 5. After a brief discussion in 
Section 6, we conclude in Section 7. All magnitudes are 
given in the AB system. 

2. VARIABILITY CHARACTERIZATION OF SOURCES AND 
THE STRUCTURE FUNCTION 

In this section we briefly review the strengths and 
weaknesses of optical color selection, of particular 
sources, focusing on quasars. We then present our 
approach to quantifying source variability, which we 
will then explore as an additional approach to selecting 
quasars, of other sources. 

2.1. Color Selection 

The most common way to generate large samples of 
optical quasar candidates for follow-up is by specifying a 
particular region of interest in color space, as was done 
e.g. in SDSS (Richards et al. 2002a, 2006, 2009). For 
quasars at 2: < 3 the u — g color is crucial in this approach 
since it enables a photometric separation of the quasar 
candidates from the stellar locus, reducing the number 
of contaminating objects to a point where spectroscopic 
follow-up is feasible. This is illustrated in Figure 1, where 
we have plotted the median color of ~9000 spectroscop- 
ically confirmed quasars, as well as an illustrative com- 
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Fig. 1. — Projections of point-source colors from the SDSS Stripe 
82 data described in Section 3 to the ugriz color space. The 
light blue (magenta) points show z < 2.5 {z > 2.5) spectroscopi- 
cally confirmed quasars. Illustrative contaminant point sources are 
shown as grey (F/G stars) and red (RR Lyrae) points. These panels 
demonstrate the importance of the w-band data in color selection 
of quasars, with the u — g color allowing the clearest discrimination. 
This clearly shows the necessity for an alternative way to lower the 
amount of contamination when the filters are too red. The lower 
right color magnitude diagram illustrates that a cut in magnitude 
will also eliminate some contaminants, and is included for com- 
parison purposes with Figures 4, 7, 13 and 14 in Richards et al. 
(2002a). The black, green and blue boxes correspond to the UVX, 
nUVX and griz color boxes defined in Tables 1, 2 and 3 respec- 
tively. The narrow appearance of the contaminant sample in the 
upper right panel is due to our contaminants being mostly refer- 
ence stars with small photometric errors. The stellar locus spanned 
by all point sources in S82 with 15 < r < 18 is shown as the black 
contours (see also figures in Richards et al. 2002a). 

parison sample of 5000 F/G and 483 RR Lyrae stars (see 
Section 3), all drawn from the SDSS Stripe 82 photomet- 
ric catalog DR7 (Abazajian et al. 2009). The top panels 
and the bottom left panel of Figure 1 shows the distribu- 
tion of the samples in the color-color planes of the SDSS 
ugriz color cube. This clearly shows the power of the 
u — g color (upper left panel) compared to the g — 
r — i and i — z colors in separating the quasars from 
their contaminants, especially for low redshift quasars 
(i.e. z ^ 2.5 shown as light blue points). The color 
magnitude diagram in the lower right panel illustrates 
that a cut in magnitude will also eliminate contaminants. 
The contours indicate the stellar locus of Stripe 82 point 
sources with 15 < r < 18 

For higher redshift objects {z > 2.5, shown as magenta 
points in Figure 1) the quasars intersect the stellar locus 
(top left panel). The purity for 2.5 < z < 3.0 quasar 
candidate samples is around 10-50% in the color-selected 
SDSS quasar target sample (Richards et al. 2006). In 
general the color selection method is efficient for low and 
high redshift quasars, but for the intermediate redshift 
objects contamination becomes a severe problem. We 
refer to Figures 13 and 14 in Richards et al. (2002a) for 
a more complete version of Figure 1. 

With Pan-STARRS 1 (henceforth PSl) the contami- 



nation problem is even more pronounced when using the 
color selection method only. PSl has a 5-ffiter system 
consisting of SDSS-like ^, r, z, z bands (albeit with signif- 
icantly higher red sensitivity) and a Y filter. The crucial 
u — g color used in the SDSS color selection method is 
not available: the contamination of a color-selected PSl 
quasar sample will be a problem for z > 2.5 as well as 
for z < 2.5. It is therefore necessary to find a way of sep- 
arating the majority of quasars from the contaminating 
stellar locus in order to obtain a pure PSl quasar sam- 
ple. The intrinsic variability of the quasars (and their 
contaminants) is a very promising tool for doing this. 

2.2. Source variability: power-law structure functions 

The structure function characterizes the variability 
of quasars (and the other sources) by quantifying the 
variability amplitude as a function of the time lag 
between compared observations (Cristiani et al. 1996; 
Giveon et al. 1999; Eyer 2002; Vanden Berk et al. 2004; 
de Vries et al. 2005; Rengstorf et al. 2006). For any ob- 
ject the observables for estimating the structure function 
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data pairs, assuming N light curve data 



points, describing the variability as the magnitude dif- 
ference between two epochs i and j, corrected for mea- 
surement errors, i.e.. 



(1) 



Here Arriij is the measured magnitude difference be- 
tween observation i and j. The and aj are the photo- 
metric errors on the measurements and Atij is the time 
difference between the two observations. The quantity 
V is defined like this so that its average, over a large 
number of data pairs, is an estimator for the intrinsic 
standard deviation of the source magnitude. 

At this point we note that Atij usually refers to the 
time lag in the quasar rest frame. However, computing 
this requires a priori knowledge of the quasar redshift, 
and when selecting objects in imaging-only surveys, we 
do not know the object redshift. Therefore, we work with 
time lags in the observed frame. This necessary conven- 
tion differs from most of the quasar variability literature; 
we will make some comparisons in Section 4. 

In previous analyses, the average V has been calculated 
in n time lag bins using data pairs from many quasars, 
thus "stacking" the variability signal to allow the prop- 
erties of the quasar population to be probed. Then 



ViAt) = 



(2) 



where the average, ( )^^, is taken over all epoch pairs i, j, 
whose lag falls in the bin At. The same approach can be 
taken in estimating the structure function of classes of 
objects (e.g. quasars at a given luminosity and redshift 
bin) if, say, only two epochs are available per object, but 
large samples exist (e.g. Richards et al. (2006, 2009)); 
in that case the ( ) in Equation 2 becomes and ensemble 
average. Vanden Berk et al. (2004) and others find that 
the ensemble average quasar structure function appears 
to follow an increasing power law with time lag. 

On the other hand, for the case where the light curve 
sampling of each object is high, we can compute the aver- 
age V{At) for an individual object (Eyer 2002). Binning 
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the ^'^^""^^ data pairs from an object's TV-point hght 
curve gives an estimate of V{At) defined at each bin 
center. This approach is computationahy efficient, and 
provides a free-form view of the object's structure func- 
tion. However, in the case of relatively sparse sampled 
data (6 epochs over 3 years in the PSl Stt survey, see 
Section 3.7), binning the data pairs to obtain the struc- 
ture function from Equation 2 to estimate the variability 
may not be the optimal approach. 

In Equation 2, both the noise and the intrinsic pho- 
tometric variability are assumed (implicitly) to have a 
Gaussian distribution (Rengstorf et al. 2006). We can 
then extend this simple model of quasar variability to 
include a power law increase in variability with time lag. 
Drawing on the results from Richards et al. (2006) we 
propose a power-law model for the structure function 
given by 

^mod(A*..IA7) = ^(^)'. (3) 

We can then fit this model to a given set of data, 
{Arriij^ Atij), as follows. We write the likelihood for 
the power law parameters A and 7 as 

CiA,^)=l[L,j , (4) 

assuming a set of independent magnitude differences as 
our data. Here Lij is the likelihood of observing one 
particular magnitude difference Arriij between two light 
curve points separated by Atij. Following the ensem- 
ble analyses referred to above, we assume an underlying 
Gaussian distribution of Am values and Gaussian pho- 
tometric errors: 

^y = ^^— exp(-^J^), (5) 

Here, the effective (observed) variability V^gp is 

Ve'ff,,, = K,od(At., \A, 7)2 + {af + a]) , (6) 

i.e., we propagate the photometric errors cr^ and (jj by 
adding them in quadrature to the variability "error" 

Vmod{Atij\A,-f) 

This approach can yield posterior probability distri- 
butions on the two model parameters, A and 7. The 
amplitude A quantifies the root-mean-square magnitude 
difference on a 1 year timescale, while 7 is the logarithmic 
gradient of this mean change in magnitude. We assign 
uninformative priors for the parameters (uniform in the 
logarithm of A, and uniform in the arctangent of 7 - since 
7 represents the slope of a straight line), and then explore 
the posterior probability distribution for these two power 
law parameters via a simple Markov chain Monte Carlo 
(MCMC) code (Metropolis et al. 1953; Hastings 1970) as 
described in Appendix A. All we are doing is replacing 
n binned structure function parameters (the values of V 
in each of the n bins) with two parameters that define 
a power law structure function, and then inferring these 
parameters instead of constructing estimators for them. 
We will show in Section 4 that using a power law model 
for the variability actually provides a good fit. 



3. SDSS STRIPE 82: A TESTBED FOR VARIABILITY 
STUDIES 

Anticipating the results of Section 4, we note that to 
detect and quantify intrinsic quasar variability will likely 
require multi-epoch data spanning several years. Before 
surveys with facilities such as Pan-STARRS and LSST 
become available, SDSS Stripe 82 (Abazajian et al. 
2009) forms an excellent training set and methodolog- 
ical test bed (e.g. Sullivan et al. 2005; Sesar et al. 2007; 
Bramich et al. 2008; Frieman et al. 2008). In this section 
we will describe the various Stripe 82 data sets that we 
have created in order to test and illustrate the prospects 
of our algorithm. 

The SDSS Stripe 82 region (henceforth S82) covers ap- 
proximately 320 deg^, from right ascension around 290° 
to 60° in a 2.5° wide band on the celestial equator. Over 
eight observing seasons it has been repeatedly observed 
in the fall months, resulting in many epochs (typically 
^60) in each of the 5 SDSS bands. As the PSl Stt survey 
will contain fewer epochs, we can "down-sample" the S82 
object light curves to simulate observations taken with 
PSl (albeit ones at lower angular resolution and depth). 

Relative to PSl, S82 does have the advantage of 
u — g color coverage, and extensive bright object spec- 
troscopy. One can therefore construct quite pure samples 
of quasars, RR Lyrae and so on, that may serve as ground 
truth for our variability selection. 

In the following subsections we describe these various 
subsamples in some detail, and provide a brief overview 
here. We have selected all the spectroscopically con- 
firmed quasars in S82 together with a representative 
set of (stellar locus) contaminants, which contains non- 
varying (type F/G stars) as well as varying (RR Lyrae) 
point sources to illustrate our method and algorithm 
prospects. These objects' photometry data are plotted 
in ugriz color space in Figure 1. To investigate the se- 
lection of quasars by their colors, we define three color 
selection boxes and explore the objects returned by each. 
One of these mimics the more limited color selection pos- 
sible with PSl: quantifying how the variability informa- 
tion then improves the PSl quasar selection is one of the 
main goals of this paper. 

In the following subsections, we describe two prepara- 
tory steps for turning the ~ 60 epoch S82 data into a 
testbed for color plus variability based quasar selection 
in SDSS (S82) and PSl: first, we describe the defini- 
tion of various sub-sets of candidate objects; then we 
describe some technical steps "cleaning" the light curves 
and down-sampling the S82 data to mimic PSl. 

3.1. Spectroscopically- confirmed quasars in Stripe 82 

Key to designing a quasar variability selection algo- 
rithm is an understanding of the variability proper- 
ties of objects that are indeed spectroscopically con- 
firmed quasars. We have selected all of these (both 
point sources and extended objects) pubhshed in the 
SDSS DR5 quasar catalog (Schneider et al. 2007) that 
fall within S82. There are 9157 spectroscopically con- 
firmed DR5 quasars in S82, spanning a redshift range 
from 0.08 to 5.09. These quasars have 15.4 < i < 22.0 
with a mean of 19.5. See Schneider et al. (2007) for the 
corresponding numbers for the whole DR5 quasar cata- 
log. 
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To get the multi-epoch photometry (light curves) for 
the 9157 quasars we performed an SQL neighbor search 
in the S82 DR7 database, choosing a search radius of 
0.5" to minimize the light curve contamination from 
misidentified (spatial) neighbors. This search on aver- 
age yielded 60 epochs per object, after selecting only 
entries with good BRIGHT, EDGE, BLENDED, NODEBLEND, 
SATUR, PEAKCENTER, NOTCHECKED, INTERP.CENTER and 
DEBLEND.NOPEAK flags (of which the first 5 are referred to 
as fatal and the rest as non-fatal flags by Richards et al. 
(2002a) - see their Table 2 or Stoughton et al. (2002) 
Table 9 for a description of the flags). For consistency 
we applied these same flag checks on all object samples 
we drew from the S82 catalog. We describe these other 
samples below. 

3.2. Stellar locus ^'contaminants in Stripe 82 

To get a sample of typical non- variable stellar contami- 
nants we used the SDSS standard star catalog of 1.01 mil- 
lion non- variable point-source objects in S82 published in 
Ivezic et al. (2007). From that we created a set of F/G- 
star colored objects, presumably non-varying, by apply- 
ing a color-magnitude cut on the standard star catalog 
so that 0.2 < ^ - r < 0.48 and 14.0 <g < 20.2 for ah the 
objects. This is a suitable cut for F/G-stars according to 
the SEGUE team (Yanny et al. 2009) and makes them 
potential quasar sample contaminants because of their 
g — r color (see Figure 1). We took a randomly selected 
subsample of 5000 objects from this catalog and did a 
neighbor search in S82 to get multi-epoch observations 
of these contaminants. We again used a search radius of 
0.5" and again made sure that none of the flags listed in 
Section 3.1 were set. 

To be able to test whether our algorithm is able to 
separate quasars from known variable contaminants, we 
used the largest available sample of securely identified 
RR Lyrae within S82 (Sesar et al. 2010), which consists 
of 483 RR Lyrae. 

We will refer to the F/G stars and RR Lyrae catalogs 
collectively as the "contaminants" in the remainder of 
the paper. 

3.3. UV-excess (UVX) objects 

We would also like to test our ability to detect quasars 
in the absence of spectroscopic data. To this end, we de- 
fined three photometrically-selected samples of S82 ob- 
jects, whose variability properties we will explore. 

The first of these is defined by a three-dimensional ugri 
color box in which the SDSS quasar sample is complete 
for extinction-corrected i magnitudes brighter than 19.1 
(Richards et al. 2002a). This color box is given in Ta- 
ble 1, and is shown in black lines in Figure 1. Note that 
this selection uses the SDSS ii-band data: we extracted 
all point sources within S82 that obeyed these "UV ex- 
cess" (UVX) criteria. This returned a catalog of 2912 
UVX point sources. 

3.4. Non-UV excess (nUVX) objects 

As a compliment to the UVX object sample defined 
above, where the color selection is known to efficiently 
return quasars at high completeness, a catalog of "non- 
UVX" (nUVX) objects was created from a region of 
ugriz color space where color selection of quasars is 



TABLE 1 
The UV excess (UVX) color box. 
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known to have problems. The color box from which we 
selected these nUVX point sources is given in Table 2, 
and shown as a green box in Figure 1 (and also Fig- 
ure 7 of Richards et al. 2002a). In this particular color 
box, the quasar locus, containing mostly intermediate 
redshift (2.5 < 2; < 3) quasars, crosses the stellar locus. 
The color selection therefore has efficiency as low as 10% 
in this region of color space (Richards et al. 2006). In 
the nUVX color box we find 3258 objects in S82. 

TABLE 2 

The non-UV excess (nUVX) color box. 



0.6 < 


u — 


9 


< 1.5 


0.0 < 


9 - 


r 


< 0.2 


-0.1 < 


r — 


i 


< 0.4 


-0.1 < 


i — 


z 


< 0.4 




i 




< 19.1 



3.5. Quasar Candidate Color Selection without u-band 

Data 

To simulate approximately the anticipated PSl 37r sur- 
vey light curves, we defined a third color box suitable 
for a first cut of the PSl catalog. The main purpose 
of this selection (where no ix-band information is used) 
is to excise the quasar locus as it threads through the 
3-dimensional griz color space. However, part of the 
stellar locus also lies in this box. We restrict ourselves 
to right ascensions between and 20 degrees (enclosing 
a sixth of the S82 area, ~50 deg^) in order to return a 
manageable catalog of 12,714 objects. The griz box is 
indicated by the blue solid lines in Figure 1 and is de- 
fined in Table 3. The magnitude cut of 19.1 is chosen 
to allow straightforward comparison with the UVX and 
nUVX boxes. 



TABLE 3 
The griz COLOR box. 



-0.2 < 


g-r 


< 0.9 


-0.2 < 


r — i 


< 0.6 


-0.15 < 


i — z 


< 0.5 


17 < 


i 


< 19.1 



3.6. Eliminating light curve "outliers^^ in Stripe 82 

Plotting the complete S82 multi-epoch photometry 
output for the various objects revealed some outlying 
points that were several magnitudes fainter than adja- 
cent flux points (see Figure 2); only some of these out- 
liers were found to be caused by image defects. However, 
we assume that such a significant decrease in magnitude 
in a single observation must be non-physical. We there- 
fore removed the outliers (irrespective of their origin) 
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19.5F 




10 20 30 40 50 

Observotlon number (ordered by MJD) 



Fig. 2. — Multi-epoch photometry output from SDSS 
Stripe 82 for the spectroscopically confirmed quasar SDSS 
J203817.37+003029.8, shown in the g (green squares), r (blue tri- 
angles) and i (red circles) bands. For a handful of epochs, the 
output magnitudes appear spuriously faint; their inclusion would 
severely affect the calculation of a variability structure function. 
Over-plotted are the corresponding medianized light curves used 
to remove the outliers (open symbols) from the raw multi-epoch 
output. The dotted lines are plotted to guide the eye and are 
spaced by half a magnitude from 20 to 21.5. In the lower panel 
the residuals between the medianized light curve and the photo- 
metric measurements for the three bands are shown, with the pho- 
tometric errors over-plotted. The limit used to remove the outliers 
(|LC — LCj^g^l > 0.25) is indicated by the horizontal dashed lines. 

by running a median filter on the photometric measure- 
ments. Measurements with a residual between the medi- 
anized light curve and the photometric data larger than 
0.25 magnitudes were removed. In Figure 2 we show the 

r and z-band multi-epoch photometric measurements 
(open symbols indicating the removed measurements) 
with the corresponding medianized light curves over- 
plotted for quasar SDSS J203817.37+003029.8. The bot- 
tom panel shows the residuals, with the limit of 0.25 mag- 
nitudes indicated by the dashed lines. As is the case here, 
in general, only a small fraction of the epochs is removed. 

It is these cleaned multi-epoch measurements, where 
the outlying observations have been removed (i.e. the 
filled symbols in Figure 2), we use in the determination 
and exploration of the objects' variability. 

3.7. Down- sampling S82 light curves to the PSl cadence 

In order to explore quasar selection in the con- 
text of the PSl 37r survey, we down-sampled the 
S82 data to mimic the planned PSl observations 
(Chambers & Dennau 2008, shown schematically in Ta- 
ble 4). We assumed 3 observing seasons for PSl, with a 
duration of 155 days (covering all filters) each. Only S82 
objects with more than 7 epochs in each (SDSS) season 
were passed to the actual down-sampling routine: ~1% 
of the quasars, < 0.1% of the F/G stars and ~20% of 
the RR Lyrae did not satisfy this criteria. 

We down-sampled the S82 light curve data by match- 
ing each season of observations for the ^, r, z and z-band 
with the (approximately) correct time intervals between 
consecutive observations in each band. No color infor- 
mation went into the down-sampling. After identifying 




0.01 I .1 . ■ ■ 1 .1 1 I 

0.01 0.10 1.00 10.00 

Time / yr (observed frame) 

Fig. 3. — Variability structure functions (solid lines) for SDSS 
J203817.37+003029.8, based on the photometry shown in Figure 2 
for the g (green), r (blue) and i (red) band, computed from Equa- 
tion 2. The best-fit power law (Equation 3) model for the structure 
function from the MCMC simulated annealing code (Appendix A) 
are over-plotted as dashed lines. The corresponding A and 7 pa- 
rameters of the power law and their estimated errors are quoted 
in the lower right corner of the plot. Calculating similar structure 
functions by means of the power-law model for the 9157 Stripe 82 
quasars give the median quasar sample structure function shown 
in the top panel of Figure 4. 

6 suitable S82 epochs in each band we removed all other 
observations, providing a set of mock PSl data. 

4. POWER-LAW STRUCTURE FUNCTIONS FOR SOURCES 
IN STRIPE 82 

In Figure 3 we show the binned structure functions 
(Equation 2) from the g^ r and i-band light curves of 
quasar SDSS J203817.37+003029.8 (Figure 2). Calculat- 
ing binned structure functions for individual S82 objects 
is a simple way of quantifying the variability of each ob- 
ject if there is a large number of epochs available. How- 
ever, Figure 3 suggests that we might indeed be justified 
in further compressing the structure function into a two 
parameters power-law fit, as proposed in Section 2.2. 

Before doing so, we explore wether this power law 
behavior is present in an average sense. In Figure 4 
we show the median sample structure function, created 
by median-combining all the individual binned structure 
functions calculated with Equation 2 separately for the 
well sampled S82 quasars (Section 3.1), F/G-stars (Sec- 
tion 3.2) and RR Lyrae (Section 3.2 and Sesar et al. 
2010). The shaded regions in Figure 4 around the 
medians enclose 68% and 95% of the individual struc- 
ture functions. Figure 4 shows that the quasar sam- 
ple median of the binned structure function closely re- 
sembles a power law with A = 0.093 ± 0.0002 and 

7 = 0.43 ±0.002, in agreement with findings elsewhere in 
the literature (Vanden Berk et al. 2004; Rengstorf et al. 
2006; Wilhite et al. 2008; Bauer et al. 2009). In particu- 
lar, a value of the slope of the sample median structure 
function of 7 = 0.43 agrees well with most of the liter- 
ature (see e.g. Table 4 in Bauer et al. (2009) for a brief 
overview). Rough estimates of the 1-year observed frame 
power law amplitudes in the literature give amplitudes 
between 0.10 and 0.14 (depending on the assumed mean 
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TABLE 4 

A SCHEMATIC OVERVIEW OF THE PLANNED PSl Stt SURVEY SCHEDULE. 



Date (days) 


-60 




. -30 . 




5 10 . . 


. 30 35 40 . . 


. 60 




90 


Band 


z 




. Y . 




i r g . . 


i r g . . 


. Y 




z 


Moon 


F 


. N . 


F 


. N . 


. F . . N . 


. F . . N . 


. F . 


. N . 


. F 



Notes: Date is calculated with respect to the first i-band measurements. Band shows the band observed. Moon indicates whether the 
moon is full (F) or new (N). The columns are spaced by five days. 



0.100 r 
0,01 o[- 

c 
O 

0.001 

c 

0) 0.100 

D 

% 0.010 
0.001 
> 0.100 

0.010 
0.001 




QSOs 

0.093 +/-0.0002 
0.43 +/-0.002 




RR Lyrae 

Sesor et ol.(2009) 
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Time / yr (observed frome) 
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Fig. 4. — The sample-median r-band binned structure function 
for the quasars (top), F/G stars (center) and 483 RR Lyrae (bot- 
tom) in Stripe 82, calculated using Equation 2. The power-law 
nature of the three samples is clearly seen, by the approximate 
straight lines the structure functions trace. The power law pa- 
rameters A and 7 obtained by fitting the quasar sample structure 
function with the 2 parameter power law model 3 are shown in 
the top panel. The shaded regions around each structure function 
indicate the 68% and 95% scatter around the median value. 

redshift of the samples), in good agreement with our es- 
timate for A of 0.093 mag. 

Figure 4 also shows that the sample structure func- 
tions of contaminants are also well described by power 
laws, but with small values of 7 (i.e. they show no long- 
term growth in their variability). Note that the F/G- 
stars, chosen to be no n- variable, have a variability am- 
plitude of '-«^0.04 mag. The RR Lyrae variability, when 
sparsely and randomly sampled in S82, looks like white 
noise (I7I <C 1) with an amplitude ~0.2 mag. Thus, the 
use of a power-law model of the form given in Equation 3 
would seem to be a fairly good assumption, and differ- 
ent types of objects may differ both in amplitude and in 
slope of their structure function. 

In Figure 3 the power law fits to the three (binned) 
structure functions are shown as dashed lines. The A 
and 7 with their estimated errors are given in the lower 
right corner of the plot. 

5. RESULTS 

Having defined different sub-samples of sources, and 
having shown that we can sensibly quantify their light 
curve characteristics by a power-law structure function 
model, we now proceed to characterize each of these 
sources by their best-fit parameters A and 7. 

As opposed to the earlier SDSS analysis 
(Richards et al. 2002a, 2006, 2009) the improved 



time sampling of the S82 (and even PSl) surveys, 
enables us to investigate the distributions of the A and 
7 parameters for the individual sources, not only for 
ensembles. 

5.1. The A-j Distribution 

Figure 5 shows the distribution of the variability char- 
acteristics, quantified by the best-fit A and 7 for all the 
spectroscopically confirmed quasars, and for the "con- 
taminant" F/G stars and RR Lyrae, as described in Sec- 
tion 3, based on their r-band S82 light curves. The his- 
tograms along the axes of the two dimensional scatter 
plot show the projected parameter distribution of the 
quasars and of the contaminants. Visual inspection of 
Figure 5 alone shows how well the spectroscopically con- 
firmed quasars separate from the (stellar locus) contam- 
inants in this space, demonstrating that the power-law 
structure function fit from a single band is an efficient 
classifier for data of this quality ('^60 epochs). 

The analogous A-j distributions for the much sparser 
PSl-like sampling of the r-band measurements (Sec- 
tion 3.7) are shown in Figure 6: these parameter esti- 
mates are based on only 6 epochs of photometry over 3 
years, rather than the ~ 60 in the full S82 survey. The 
separation of the quasars and the contaminants is less 
clean with the PSl sampling, but one nevertheless clearly 
sees a quasar-dominated region with rather low contam- 
ination. Plots analogous to the ones shown in Figures 5 
and 6, but for the i and z-band measurements, show 
that on average A decreases by 30-50% going from g to z 
band, whereas 7 is unchanged with varying wavelength. 
Thus, the separation of the quasars from the contam- 
inants via their 7 values appears to work comparably 
well in all 4 bands (with somewhat more scatter in the 
z-band). In agreement with Kozlowski et al. (2010), no 
clear difference in the ratio of the amplitudes at different 
wavelengths between RR Lyrae and quasars is detected. 

Since most F/G-stars should not vary, but RR Lyrae 
do, they should have different A distributions. This is 
seen in Figure 5 and 6: RR Lyrae have magnitude ampli- 
tudes above ^0.1 (e.g. Soszynski et al. 2003; Sesar et al. 
2010), while the F/G-stars have characteristic values of 
A ~ 0.01. It is therefore clear that our approach can also 
separate RR Lyrae from stellar (non-varying) contami- 
nants without doing a full fit of a periodic light curve. 

5.2. Completeness and Purity 

To move beyond a merely qualitative assessment of the 
separability of the quasars from contaminants we now 
estimate the achievable completeness and purity of the 
resulting sample. Purity is the more difficult quantity 
to estimate as it requires appropriate abundances for the 
contaminants. We do not have these for the training set 
parent quasar, F/G star and RR Lyrae samples, where 
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that define a quasar selection box, and then quantify its 
performance. Specificahy, our fiducial quasar selection 
region is bounded by the following three straight lines: 



0,10 

A (omplitude at 1 yr) 

Fig. 5. — Distribution of the variability structure function pa- 
rameters A and 7 (Equation 3) for ~15,000 individual objects in 
Stripe 82. The spectroscopically confirmed quasars are shown as 
light blue points; confirmed RR Lyrae and color-selected F/G stars 
are shown in red and grey respectively. A separation of the quasars 
from the stellar locus contaminants is clearly seen. The three solid 
lines (Equations 7-9) define the region in which we estimate the 
quasar completeness c of our algorithm (which turns out to be 93% 
in this case, Table 5). Along the axes we show the projected A and 
7 distributions for the sub-samples. 




0,01 



0,10 

A (omplitude at 1 yr) 



0.0 0.1 0.2 



Fig. 6. — Distribution of the variability structure function pa- 
rameters, similar to Figure 5, but after down-sampling to the 6 
epochs, expected PSl Stt survey cadence (Section 3.7). A variabil- 
ity separation of quasars and their contaminants is again apparent, 
albeit not as clearly as in Figure 5. The selection region is the same 
as in Figure 5; the completeness is given in the upper right corner 
of the plot, and in Table 5. 



the ratio of quasars to contaminants is 2:1, instead of 
a more realistic ~1:25, and therefore can only estimate 
completeness when working with these samples. How- 
ever, by design the UVX, nUVX and griz selection boxes 
(Sections 3.3-3.5 and 5.2.1-5.2.3) give parent samples 
with the correct quasar-contaminants ratio: we use these 
to explore the purity of our quasar selection algorithm. 
For the present we divide the A-j plane by simple cuts 



7(A) = 0.5 *log(A) +0.50 
7(A) = -2*log(A) -2.25 
7(A) =0.055 



(7) 
(8) 
(9) 



These cuts are shown as black solid lines in Figures 5-9. 

This can be thought of as a way of providing lower lim- 
its on the available completeness and purity, that a more 
sophisticated selection procedure would improve upon. 
One could of course tweak the cuts in Equations 7-9 to 
explore the trade-off between purity and completeness, 
which we have only done here "by eye." 

Applying these cuts to the data leads to the complete- 
ness given in Table 5 for the "QSO+contam" catalog. 
The completeness is calculated as the fraction of spec- 
troscopically confirmed quasars in the sample that fall 
within the cuts. In our simple illustrative setup we have 
a completeness of 93% for the quasars in the case where 
the time sampling is equal to Stripe 82 (~60 epochs). In 
the case of a PSl-like time sampling the completeness 
drops to 76%. 

Of the RR Lyrae 97% and 83% (for the S82 and PSl 
sampling respectively) fall in the high-A low-7 corner be- 
low the line given by Equation 7. In this region <0.5% 
of the 5000 F/G-stars lie, illustrating quite a clean sep- 
aration between RR Lyrae and non-varying stellar con- 
taminants. 

Besides calculating the overall completeness, we also 
split our data into redshift bins. The completeness of the 
quasars is rather constant as a function of redshift, with 
a minor loss of about 5-10% for redshifts above 4. Since 
our training set only contains 52 quasars at z > 4, we 
were not able to investigate this decreased completeness 
further. 

In the next three subsections we proceed to explore 
the A-j distributions for the samples of objects that 
were selected only on the basis of their colors (Table 1, 2 
and 3). For those samples, we estimate the completeness 
(as above) and also the purity, defined as the fraction of 
known quasars compared to the total number of objects 
inside the A-j selection regions. 

5.2.1. Quasars in the UVX catalog 

For the UVX object sample (Section 3.3) there is 
enough spectroscopy in S82 to define a spectroscopi- 
cally confirmed quasar subsample, a reference catalog of 
quasars which is complete in S82. By combining the spec- 
troscopically confirmed quasars in S82 (Section 3.1), with 
the objects from the 2SLAQ (2-degree field SDSS lumi- 
nous red galaxies and QSO) survey (Groom et al. 2009a), 
our final quasar reference catalog contains 11216 indi- 
vidual quasars (9157 from SDSS and 2059 from 2SLAQ). 
We only selected objects flagged as "QSO" in the pub- 
licly available 2SLAQ data.^ 

Matching this quasar reference catalog with the catalog 
of UVX S82 point sources returned 2140 quasars out of 
the 2912 objects in the UVX catalog. Thus, 73% percent 
of the point sources in the UVX color box are known 
quasars. It is not surprising that the fraction is so large, 

^ http://www.2slaq.info/ 
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TABLE 5 

The completeness c and purity p of the SDSS Stripe 82 (S82) and mock PSl variability-selected object catalogs. The Sc 

AND Sp INDICATE THE POISSON ERROR ON C AND p. 



Parent Catalog 


Sampling 


c 


6c 


P 


Sp 


Quasar Reference Catalog 


QSO+contam. 


S82 


93% 


1% 






SDSS 


QSO+contam. 


PSl 


76% 


1% 






SDSS 


uvx 


S82 


90% 


3% 


95% 


3% 


SDSS+2SLAQ 


uvx 


PSl 


73% 


2% 


92% 


3% 


SDSS+2SLAQ 


nUVX 


S82 


90% 


15% 


96% 


16% 


Visual 


nUVX 


PSl 


65% 


12% 


32% 


5% 


Visual 


griz box 


S82 


92% 


6% 


92% 


6% 


Visual & SDSS+2SLAQ 


griz box 


PSl 


75% 


6% 


30% 


2% 


Visual & SDSS+2SLAQ 



since we used the powerful u — g color in the definition 
of the color box. This simply re-affirms that the UVX 
box is a region of color space where the quasars are in 
the majority, as they are well separated from the stellar 
locus by the u — g color; it is exactly this separation to 
which we are trying to find an alternative. 

We estimated A and 7 for the entire UVX catalog, 
using both the full (S82) sampling and the sparser PSl- 
like version of it. The result is shown in the top row 
of the A-j plots in Figure 7. Applying our simple vari- 
ability selection cuts (Equations 7-9) returned 2033 and 
1734 quasar candidates for the S82 and PSl sampling, 
respectively. Matching these objects with the 2140 know 
SDSS-f-2SLAQ quasars in the UVX catalog returned 1935 
and 1573 matches. Thus we are able to detect the UVX 
quasars with a completeness of 90% and a purity of 95% 
(1935 matches/2033 candidates) when using the S82 time 
sampled data. For the PSl-like sampling of the data we 
get a completeness of 73% and a purity of 92% (Table 5). 

5.2.2. Quasars in the nllVX catalog 

The catalog of the UVX point sources was deliberately 
chosen from a region of color space where the color selec- 
tion already does a superb job finding quasars, as con- 
firmed by the complete SDSS and 2SLAQ quasar catalog 
and the completeness and purity of our A-j approach. 
However, one might argue that in this case (of UVX 
quasars) a light curve analysis adds little. To explore 
the A-j approach further we applied it to the non-UV 
excess objects described in Section 3.4. 

In the nUVX color box (Table 2) there is no simple 
way to quantify the completeness of the parent sample of 
color-selected objects, since we do not know how many 
quasars were missed in this region of color space dur- 
ing the SDSS survey. To try to quantify this, we ex- 
tracted the spectra from the 973 objects in this catalog 
that were targeted for SDSS spectroscopy. By visually 
inspecting these spectra we were able to compile a cat- 
alog of 77 quasars among the S82 nUVX objects. (Of 
these 77, 6 quasars are not in the SDSS+2SLAQ cata- 
log). This means that if the SDSS fibers had been allo- 
cated to nUVX objects randomly, then the purity of the 
nUVX quasar sample would be 77/973 = 8%. However, 
in practice the fibers were placed according to a Bayesian 
ranking that made fuller use of the color information, so 
that this 8% is likely an upper limit on the purity of 
the nUVX quasar sample. (The fraction of quasars hid- 
den in the un-targeted nUVX objects is likely lower than 
the fraction of quasars found in the nUVX objects with 
spectra.) 

Applying our A-j analysis and our variability selection 



criteria to the spectroscopic sub-catalog of 973 nUVX 
objects returned 72 (178) quasar candidates for the S82 
(PSl-like) time sampling. The nUVX objects' variabil- 
ity parameters are plotted in the bottom row of Figure 7. 
Estimating the completeness and purity in the S82 sam- 
pling case, assuming that the 973 objects with spectra are 
a random subset of all the nUVX objects, gives that we 
are 90% complete and 96% pure (Table 5). By the same 
argument as above, the purity is an upper limit on the 
overall purity of the nUVX quasar sample whereas the 
completeness is exact. The purity and the completeness 
stand on their own in quantifying our ability to recover 
the spectroscopically confirmed quasars. Thus, the ad- 
dition of the variability information enhances the purity 
to 96% instead of 8%, as is the case for the pure color 
selected sample. In the case of the sparser PSl-like sam- 
pling we still have a completeness of 65%, and a purity of 
32%. This clearly demonstrates that a variability-based 
approach is very efficient at selecting nUVX quasars 
when the data is well sampled in time. Even with the 
sparse PSl sampling, the purity increases by a factor of 
four when variability information is used. 

The plot of the S82-sampled nUVX objects (lower left 
corner of Figure 7) shows a clear bimodality of the A 
parameter distribution of the unknown (red) objects. As 
seen with the object training set, this bimodality is a 
probable separation between the non-varying contami- 
nants and the varying (possible RR Lyrae) contaminants. 
Thus the nUVX objects have been separated into quasar 
candidates (high 7; intermediate A), RR Lyrae candi- 
dates (low 7; high A) and non- varying stars (low 7; low 
A). 

5.2.3. Quasars in the griz Color Box 

To simulate quasar candidate selection without ii-band 
photometry, we applied our variability analysis to the 
objects lying in a fairly large multi-color region in griz 
space, the so-called griz box defined in Section 3.5. This 
color box fully contains an important part of the stellar 
locus. Estimating the completeness and purity of our 
algorithm for the griz box is difficult, as no clear esti- 
mates of the abundance of quasars exist for such a color 
cut. We therefore checked the quasar candidates against 
a catalog of the SDSS+2SLAQ quasars, plus the 6 ex- 
tra quasars found via the visual inspection of the nUVX 
spectra. This may fall considerably short of a complete 
sample of quasars, but at the moment it is the best we 
can do; the purities calculated in this section are there- 
fore lower limits. Matching this quasar reference cata- 
log to the 12,714 objects in the griz box we found 443 
known quasars. Estimating A-j for all these sources, re- 
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Fig. 7. — Distribution of variability structure function power law parameters A and 7 measured for the objects in the UVX and nUVX 
catalogs of Table 5. The left plots corresponds to the catalogs with a Stripe 82 time sampling, while the right plots correspond to the 
sparser PSl time sampling (6 epochs over 3 years). The three solid lines in each scatter plot correspond to the quasar variability selection 
region defined by Equations 7-9. The completeness and purity estimates are shown in the upper right corner of each scatter plot, and in 
Table 5. In the upper left corner the total number of objects in the catalog is stated. The light blue points indicate the known quasars, 
and the red points all other objects. Along the axes of the two dimensional scatter plots the projected probability distributions for A and 
7 are shown as histograms. 



turned 442 (1,118) variability-based quasar candidates, 
when considering the S82 (PSl) samphng and when ap- 
plying the ^4-7 cuts of Equations 7-9. Of these candi- 
dates 407 (333) were found to be known quasars. Thus, 
for the broad griz color pre-selection, variability selec- 
tion achieves 92% (75%) completeness and a purity of 
92%(30%) for the S82 (PSl) sampled data, respectively. 
The A and 7 distributions for the griz box selected ob- 
jects are shown in the top panel of Figures 8 (S82 sam- 
pling) and 9 (PSl sampling). 

In the bottom panel of Figure 8 and 9 we have pro- 
jected our variability selected quasar candidates back 
into ugr and gri color space, in order to understand the 
color distributions of variability selected quasar candi- 
dates. The figures show that 86% and 98% of the not 
spectroscopically confirmed candidates fall on the gri 
stellar locus, defined as the (blue) contour level contain- 
ing 95% of the stellar locus objects in S82. This suggests 



that many, if not most, of these unconfirmed quasar can- 
didates are stars scattered into our variability-selection 
region. However, there are some possible quasars among 
the unknowns judging from their colors. For instance, 
a few of the unknown objects (6% and 1% in the S82 
and PSl sampled case respectively) fall in the ugr UVX 
selection box. This illustrates that our purity estimates 
are lower limits but close to the likely truth. It also 
shows that the ugriz quasar color selection in SDSS 
(Richards et al. 2002a, 2006, 2009) has done an excellent 
job, implying that < 10% of the quasars with i < 19.1 
are "hiding" in the stellar locus and have been missed by 
the SDSS selection. 

When defining the griz box in Section 3.5 we included 
the stellar locus to achieve as high a completeness as 
possible and to illustrate the prospects of our approach. 
However, removing objects falling within the stellar lo- 
cus, could greatly enhance the purity at a modest reduc- 
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Fig. 8. — The top panel shows the A and 7 power law parameter space of the objects in the griz selection box (Sections 3.5 and 5.2.3) 
with Stripe 82 time sampling. The projected probability distributions for A and 7 for the confirmed quasars (light blue points) and the 
unknown objects (red points) are shown as histograms. The solid lines indicate the selection cut defined in Equations 7-9. The numbers 
12,217, 442 and 55 in the A-7 plane indicate the number of objects in the given region. The estimated completeness and purity is shown 
in the upper right corner of the scatter plot. The bottom row shows the 442 variability selected quasar candidates (407 confirmed quasars 
and 35 unknown candidates) from the A-7 space, projected back into ugr and gri color space. The black, green and blue boxes correspond 
to the UVX, nUVX and griz selection boxes (Sections 3.3-3.5 and 5.2.1-5.2.3). 87% of the confirmed quasars and 6% of the unknown 
candidates fall in the black UVX box in ugr color space (lower left plot). 5% and 6% of the quasars and unknowns fall in the green nUVX 
box. The contours in the bottom plots indicate the Stripe 82 stellar locus. 95% of the stellar locus objects are within the blue contour level 
(the last but one outer contour). Above 80% of the 35 unknown candidates and 41% of the 407 confirmed quasars fall within the blue gri 
contour, providing an estimate of the number of quasars "hiding" in the gri stellar locus. 
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0,01 OJO 0.0 0,2 0,4 

A (omplitude at 1 yr) 




Fig. 9. — Plots similar to those of Figure 8 for the griz selection box data down-sampled to a PSl cadence (6 epochs over 3 years). The 
numbers 11,420, 1,118 and 169 in A-j space (top panel) indicate the number of objects in each of the three regions defined by the black 
solid lines as defined in Equations 7-9. The estimated completeness and purity of the 1,118 variability-selected quasar candidates is shown 
in the upper right corner of the top panel. Of these 1,118 objects, 42% of the confirmed quasars and 98% of the unknown candidates fall 
within the blue gri color space (lower right plot) stellar locus contour level, which encloses 95% of the Stripe 82 stellar locus objects. 87% 
of the confirmed quasars fall in the black UVX selection box in ugr color space (lower left plot), as opposed to only 1% of the unknowns. In 
the green ugr nUVX box fall 5% and 1% of the quasars and unknown candidates. These percentages indicate that most of the unknowns 
are stars scattered into our selection region due to the 6 epoch sampling of PSl. 
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tion of the completeness. Thus, selecting quasar candi- 
dates without ii-band information can be put into four 
scenarios: 

1) A "naive" griz box {i < 19.1) including the stellar 
locus and not considering variability information 
will have a quasar selection completeness above 
90%, but a purity of only 4%. 

2) Taking a griz box color selection, but cutting out 
the stellar locus (defined by the (blue) contour in 
Figures 8 and 9) and still not considering variabil- 
ity information improves the purity to ~48% but 
lowers the completeness to ~59%. This is what 
would be easily achievable in a single-epoch griz 
survey. 

3) Combining a griz color selection including the stel- 
lar locus, but considering variability information 
(illustrated in Figures 8 and 9) leads to a com- 
pleteness of 92% (75%) and a purity of 92% (30%) 
for the S82 (PSl) sampled data, respectively. This 
means that PSl can provide a 75% complete quasar 
sample, if 30% purity were acceptable. 

4) Finally, removing the stellar locus from the griz 
selection box and combining this with variability 
information lowers the completeness to 54% (44%), 
but returns a purity of 97%(92%) with S82 (PSl) 
time-sampled data. PSl can provide a high purity 
quasar sample that is ~50% complete. 

This illustrates how a variability selection cut greatly 
improves the candidate sample as compared to quasar 
candidate selection based on colors alone when ii-band 
information is not available. Keeping in mind that the 
purities calculated here are lower limits, these results sug- 
gests that the variability selection of quasar candidates 
in PSl or other multi-epoch surveys will produce high 
quality, extensive samples. 

6. DISCUSSION 

In this paper we have explored the selection of quasars 
by their variability in lieu of their UV excess in the 
context of the PSl Sir survey as a test case for vari- 
ous upcoming multi-epoch surveys. Given its extensive, 
multi-year time sampling, SDSS Stripe 82 is an excellent 
test bed, that allows us to address variability selection of 
quasars (and other sources) in general. Besides PSl and 
SDSS, LSST (Ivezic et al. 2008; Abeh et al. 2009) and 
Gaia (Jordi et al. 2006) will be able to employ variabil- 
ity selection similar to that which we have presented. 

For any source, we have characterized its variability 
by a simple power law model for its light curve struc- 
ture function, which is similar to (but not the same as) 
MacLeod et al (2008); Sumi et al. (2005); Eyer (2002); 
Cristiani et al. (1996). For each object the amplitude A 
is the typical variability within 1 year, and the expo- 
nent 7 describes how the expectation value for the mag- 
nitude differences changes with the time between mea- 
surements. Applying this simple variability characteriza- 
tion to spectroscopically confirmed quasars, to presum- 
ably non- variable sources (F/G stars) and to known RR 
Lyrae, we have found that the A-j space separates these 
source classes nicely. Quasars have 0.07 < A < 0.25 and 



0.15 < 7 < 0.5; RR Lyrae have A ~ 0.2 and 7 ~ 0, as 
the rapid periodic variations in their light curves appear 
as "white noise" with no secular trend when coarsely 
sampled on year- long time scales; finally, non- variable 
sources have A < 0.05. The variability properties for 
quasars derived here for individual objects with ~60- 
epoch light curves, are consistent with those derived by 
for instance Vanden Berk et al. (2004) and Bauer et al. 
(2009) using ensemble averaging. The results allowed the 
definition of a simple variability-based quasar selection 
in multi-epoch data, using cuts in the A-j plane. As we 
summarize quantitatively below, this variability selection 
works very well when considering 60-epoch light curves 
in S82, and still works quite well for the sparser sampling 
expected for PSl. 

The presented algorithm for variability selection is sim- 
ple enough that it can be applied to large samples. Our 
IDL code (not optimized for speed) running on two dual- 
core CPUs takes 2 hours to characterize the ~ 12, 000 ob- 
jects of the griz color box with the PSl sampling. 

The extensive spectroscopy of quasars in S82 has en- 
abled stringent completeness and purity estimates for 
variability selection, at least for UVX quasars. Our anal- 
ysis has shown that variability-based quasar selection 
with only 6 epochs over 3 years (as for PSl) is effective, 
producing either complete or pure samples. However, 
something more like the ~60 epochs of S82 is needed to 
produce variability-selected quasar samples that are both 
complete and pure (with only weak color pre-selection). 

Therefore, it is paramount to eventually combine the 
variability information in different filters, in order to 
boost the number of epochs in PSl. We can attempt to 
do this by predicting time-offset synthetic r-band mag- 
nitudes from the g, i^ z and Y bands which could reduce 
the scatter in the A-7 plane, and so enhance the com- 
pleteness and the purity of the PSl sampled catalogs. We 
leave this important extension of our method to further 
work. 

Considerably further in the future, LSST (Ivezic et al. 
2008; Abell et al. 2009) will start operating and is 
planned to ultimately have ~200 epochs spread over 10 
years. This makes LSST optimal for creating complete 
and pure variability selected quasar samples. Consider- 
ing a UVX box like the one used here, a quasar can- 
didate variability selection with the LSST cadence and 
a 10 year basehne would definitely push both the com- 
pleteness and purity above the (lower) S82 limits of 90% 
and 95% shown in the top left plot of Figure 7 and in 
Table 5. Such samples are expected to be obtainable 
at least down to LSST's lOa limiting magnitude of 
around 23.3 Oguri & Marshall (2010), or even down to 
5(7 (^AB ^ ^^^^ epoch variability selection. 

If we further consider the fact that LSST will have a Ri- 
band, a LSST color-variability quasar selection will pre- 
sumably be able to create (UVX) quasar catalogs almost 
as pure and complete as spectroscopic samples. Hence, 
LSST win be superior to PSl in the 20,000 deg^ LSST 
will observe. However, until LSST happens, PSl is excel- 
lent for improving and further developing variability se- 
lection of quasars, and it will provide the first large vari- 
ability (and grizy-coloi) selected quasar samples. The 
LSST (and PSl) quasar catalogs will with estimated sizes 
of 100 quasars per square degree, or even more, continue 



14 



Schmidt et al. (2010) 



the wild growth of quasar catalog sizes seen the last 40-50 
years (Richards et al. 2009), and thereby stress that we 
are on the verge of an era where confirming the quasar na- 
ture of all objects in the catalogs by spectroscopic follow- 
up will be unfeasible. This makes the purity of the pho- 
tometric samples crucial for doing statistical analyses of 
the quasars. As demonstrated variability provides puri- 
ties above 90% for well sampled data at all redshifts and 
not only for UVX object and will therefore be an im- 
portant step in achieving very pure photometric quasar 
catalogs in the future. 

7. CONCLUSIONS 

We have presented a simple, parameterized variability 
characterization for sources with many epoch photom- 
etry. We do this by fitting a power-law model for the 
structure function of each source's light curve; the model 
is specified by an amplitude A and a power-law index 7. 

We have applied this approach to understanding 
variability-selection of quasars in multi-epoch, multi- 
color surveys, either augmenting or supplanting the more 
common color-selection. Specifically, we have analyzed 
data in SDSS Stripe 82 (S82) both to understand vari- 
ability selection per se and as a testbed for ongoing and 
upcoming surveys, such as PSl and LSST. To predict 
PSl's ability to identify quasars, we have down-sampled 
S82 data to a set of 6 epochs, resembling the expected 
single-band time sampling in the PSl Stt survey. 

For all sources in various sub-samples we calculated the 
parameters A and 7, and found that for sufficiently many 
epochs over a multi-year interval, quasars separate well in 
the A-^ parameter plane from non- variable sources and 
e.g. RR Lyrae. Quasar variability as typically character- 
ized by 0.07 < A < 0.25 and 0.15 < 7 < 0.5, consistent 
with earlier ensemble analyses. 

Drawing on the nearly complete spectral identifica- 
tion of quasars with i < 19.1 in S82, we have explored 
both the completeness and the purity of single-band, 
variability-selected quasar samples with both S82 and 
PSl time-sampling, and with or without the benefit of 
i^-band photometry (which PSl will not have). 

Specifically, we found the following: 

• Among the complete, spectroscopically confirmed 
sample of UV-excess quasars in S82, we can identify 
90% of them on the basis of ~60 r-band epochs over 
~5 years. This variability selected sample only has 
a 5% contamination from other objects. 

• Repeating the same exercise but with the 10-times 
sparser PSl sampling, reduces the completeness to 
73% but with still high purity (top right of Fig- 
ure 7). 

• In the redshift range 2.5 < z < 3, where quasars 
overlap with the stellar locus in color-color space, 
variability can find 90% (65%) of all spectroscop- 
ically confirmed quasars for S82 (PSl) sampling. 
This is a factor of 5-10 more complete than exist- 
ing color-selection in this particular regime (bot- 
tom panel of Figure 7). 

• To understand our ability to select quasars through 
their variability when no ii-band data are available, 
we selected all sources in a broad griz color box. 



known to contain almost all spectroscopically con- 
firmed quasars. In this color box, stars and other 
contaminants outnumber quasars by a factor of 30 
for 17 < i < 19.1. Nonetheless, variability selection 
encompasses 92% (75%) of known quasars for S82 
(PSl) samphng. For S82 sampling (Figure 8) the 
purity of this sample is still very high (92%), but it 
is rather lower with PSl-like sampling (Figure 9), 
30%. 

• If in the case of PSl (6 epochs, no ii-band) a more 
pure quasar sample is desired, this can be done by 
omitting the stellar locus in the griz box and then 
looking at the variability of the remaining sources. 
This yields a completeness of only 44%, but with a 
purity of 92%. 

• The high purity of the variability selected quasar 
sample in the griz box (with S82 sampling) con- 
firms that the fraction of overlooked quasars in S82 
must be small (< 10%); this inference is predicted 
on the assumption that the quasars "buried" in the 
stellar locus have a variability behavior similar to 
the others. 

• The same (A, 7) analysis is also very effective at 
identifying RR Lyrae. With S82 sampling 97% of 
the RR Lyrae from Sesar et al. (2010) are found, 
with PSl sampling still 83%; they appear as objects 
with (A, 7) ^ (0.2,0) (red points in Figure 5 and 
6). 

As mentioned in the introduction, new and larger 
quasar samples have many and varied interesting appli- 
cations. One of these is the possibility of finding lensed 
quasars. In this work we have only focused on point 
source selection, with the purpose of finding quasar can- 
didates. Such an approach would be directly applicable 
to a search for wide separation lenses, where (by defi- 
nition) the multiple images are well-resolved. Another 
very interesting use of variability selection, again with 
quasar lens finding in mind, is to look at spatially ex- 
tended objects rather than point sources. Applying the 
algorithm to a set of extended, yet quasar-colored, ob- 
jects will return a list of small separation quasar lens 
candidates. We have initiated such a search in Stripe 82 
with promising preliminary results, which we will present 
in a forthcoming paper. This search is a pilot for the PSl 
lensed quasar search, which will be able to be started af- 
ter about a year of survey imaging has been built up. 
We expect to find as many as 2000 lenses in this database 
(Oguri & Marshall 2010). Our analysis illustrates that 
combining color-selection of quasars with variability se- 
lection is a powerful approach to take. 
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APPENDIX 

A. INDIVIDUAL STRUCTURE FUNCTION PARAMETER INFERENCE BY MCMC 

We use a simple Markov chain Monte Carlo (MCMC) approach (e.g. Metropolis et al. 1953; Hastings 1970; 
Press et al. 1992; Hansen 2003) to infer the parameters A and 7 (Equation 3) and their confidence regions. Our 
MCMC procedure takes as input a catalog of magnitudes, photometric errors and observed frame MJDs, converted to 
N{N — l)/2 pairs of observations (Equation 1). We then initialize it as follows: 

• Pick a starting point for A and 7: we chose 0.1 for both 

• Define an initial Gaussian proposal distribution (PD) 

• Set an initial temperature P for the chain 

The PD width sets the mobility of the Markov chain. Based on several tests we set the initial PD width to 0.05 in 
both the log A and 7 directions. During a "burn-in" period at the start of sampling, we draw samples from a modified 
posterior PDF for the parameters, given by the product of the prior PDF, and the likelihood raised to the power 
of A. This parameter is an inverse temperature; we start from ^0 = ^ = 10~^. The A parameter is then increased 
geometrically to unity as 500 samples are drawn, at which point burn-in is declared over, the inverse temperature is 
fixed at A = 1, and the subsequent samples are stored and used to compute various statistics. For the post burn- in 
sampling, we use an updated Gaussian proposal distribution, whose widths are set to 10% of the standard deviations 
of the parameter (log A and 7) values sampled during burn-in. 

At each point in parameter space proposed, we calculate the (un-normalized) log posterior probability distribution 
of the step, which we define as 



logP = logP(A)+logP(7)-A^ . 
Here, is related to the logarithm of the likelihood defined in Equation 4 

-21og£ = I ^log(27rV;|,,,.) + ^ 



with the effective variability defined as in Equation 6: 



Kff.y =Knod(A7|A^„•)'■ 



(5Amf^. 



(Al) 



(A2) 



(A3) 



The sums in Equation A2 are over the data pairs derived from the light curve of the given object, and SArriij is the 
photometric error on the ij^^ magnitude pair. In Equation Al log P{A) and logP(7) represent the (log) prior PDFs 
for the parameters A and 7, which we chose to be uninformative. We assigned the following functional forms: 



P(7)(x 



1 + 7' 



(A4) 
(A5) 



If 7 is negative or A lies outside the range [0, 1], the log prior density for that parameter is set to —10^^, lowering the 
overall posterior probability for that particular iteration step to effectively zero. In this way we enforce our assumption 
that the power law exponent is positive and that the average variability on a 1 year timescale is less than 1 magnitude 
(as found by e.g. Vanden Berk et al. 2004; Bauer et al. 2009). Samples are accepted or rejected via the Metropolis- 
Hastings algorithm (Metropohs et al. 1953; Hastings 1970). Care is taken to make the comparison of the log posterior 
values at constant inverse temperature. 
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For our final A and 7 values, we choose to take the position of the global peak of the likelihood, an approximation 
of the "best-fit" point. We approximate this by keeping track of the value of the likelihood as we sample, and then 
using the sample with the highest value as our estimate. In practice the posterior PDF is not dominated by the prior, 
such that the peaks of the likelihood and the posterior PDF are usually quite close together. 

The uncertainties on the parameters are estimated by considering the 16*^ and 84*^ percentiles of the 1-dimensional 
marginalized distributions. In the case of a (symmetric) Gaussian distribution, this would correspond to the la error 
bar. 

To confirm our choice of initial conditions, cooling schedule and PD evolution as sensible, we tested our sampler 
on simulated light curves for both sine wave and Gaussian white noise sources, and recovered the correct parameters 
(zero 7 and analytically calculated A). 
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